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ABSTRACT 

While application software does the real work, domain-specific languages (DSLs) are tools to help produce it efficiently, and language 
design assistants in turn are meta-tools to help produce DSLs quickly. DSLs are already in wide use (HTML for web pages, Excel macros 
for spreadsheet applications, VHDL for hardware design, . . .), but many more will be needed for both new as well as existing application 
domains. Language design assistants to help develop them currently exist only in the basic form of language development systems. 
After a quick look at domain-specific languages, and especially their relationship to application libraries, we survey existing language 
development systems and give an outline of future language design assistants. 
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1 Domain- Specific Languages 

Many computer languages are domain- specific rather than 
general purpose. Domain-specific languages (DSLs) are 
also called task- specific, application- oriented, or problem- 
oriented. Some well-known DSLs with their application 
domains are listed in Table |l|. So-called fourth- generation 
languages (4GLs) are usually DSLs for database applica- 
tions. A good reference, which focuses on the key role of 
DSLs in end user programming, is JTsfl . A recent DSL bib- 
liography is p . 

We will not try to give a definition of what constitutes 
an application domain and what does not. Some people 
consider Cobol to be a DSL for business applications, while 
others would argue this is pushing the notion of application 
domain too far. Leaving matters of definition aside, it is 
natural to think of DSLs in terms of a gradual scale with 
very specialized DSLs such as HTML (Table [l]) on the left 
and general purpose programming languages such as CH — h 
on the right. On this scale, Cobol would be somewhere 
between HTML and C++, but much closer to the latter. 

In combination with an application library, any general 
purpose programming language can act as a DSL, so why 
were DSLs developed in the first place? Simply because 
they can offer domain-specificity in better ways: 

• Appropriate or established domain- specific notations 
are usually beyond the limited user-definable opera- 
tor notation offered by general purpose languages. A 
DSL offers domain-specific notations from the start. 
Their importance cannot be overestimated as they are 
directly related to the suitability for end user program- 
ming and, more generally, the programmer productiv- 
ity improvement associated with the use of DSLs. 

• Appropriate domain- specific constructs and abstrac- 
tions cannot always be mapped in a straightforward 
way on functions or objects that can be put in a li- 
brary. This means a general purpose language using an 
application library can only express these constructs 
indirectly. Again, a DSL would incorporate domain- 
specific constructs from the start. 

Nevertheless, application libraries are formidable competi- 
tors to DSLs. For once, designing and implementing a DSL 
is far from easy. A domain expert is usually not an expert 
in language design, which is a distinct (meta-)domain of ex- 
pertise in itself. How this situation can be improved is the 
main topic of this article, but even with improved DSL de- 
velopment tools, application libraries will remain the most 
cost-effective solution in many cases. 

There are other factors complicating the relative merits 
of DSLs and application libraries. A case in point is Mi- 
crosoft Excel. Its macro language is a DSL for spreadsheet 
applications which adds programmability to Excel's funda- 
mental interactive mode. Using COM, Microsoft's software 
component technology, Excel's implementation has been 
restructured into an application library or tool box of COM 



Table 1: Some widely used domain-specific languages. 



DSL 


Application domain 


BNF 

Excel macro language 

HTML 

IATgX 

Make 

SQL 

VHDL 


Syntax 
Spreadsheets 
Hypertext web pages 
Typesetting 
Program maintenance 
Database queries 
Hardware design 



components. This has opened it up to general purpose pro- 
gramming languages such as CH — h, Java and Basic, which 
can access it through its COM interfaces. This is called 
Automation and is described in more detail in Unlike 
Excel macro language, which by its very nature is limited 
to Excel functionality, general purpose programming lan- 
guages are not. They can be used to write applications 
transcending Excel's boundaries by using components from 
other "automated" programs and COM libraries in addi- 
tion to components from Excel itself. 

2 Existing Language Development 
Systems 

As noted, DSL development is hard, requiring both domain 
knowledge and language development expertise. The de- 
velopment process can be speeded up by using a language 
development system. Some representative ones are listed 
in Table ^|. They have widely different capabilities and are 
in widely different stages of development, but are based 
on the same general principle: they generate programming 
tools from language descriptions. 

The input to these systems is a description of various 
aspects of the DSL to be developed in terms of special- 
ized meta-languages. Some important language aspects are 
listed in Table [| It so happens that the meta-languages 
used for describing these aspects are themselves DSLs for 
the particular aspect in question. For instance, an impor- 
tant language aspect is syntax, which is usually described 
in something close to BNF, the well-known DSL and de 
facto standard for syntax specification (Table . The cor- 
responding tool generated by the language development 
system is a parser. 

The tool generation capabilities of the language develop- 
ment systems listed in Table |^ are shown in Table |]. All 
of them can generate lexical scanners, parsers, and pret- 
typrinters, many of them can produce syntax-directed edi- 
tors, typecheckers, and interpreters, and a few can produce 
various kinds of software renovation tools. These tools are 
as useful for DSLs as they are for programming languages. 

Although the various specialized meta-languages used for 
describing language aspects differ from system to system, 
they are usually rule based. For instance, depending on 
the system, the typechecking of language constructs has to 
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Table 2: Some representative language development sys- 
tems. 



System 


Developed at 


ASF+SDF Meta-Environ- 

ment Q 

Centaur [|3| 

EU|| 

Gem-Mex § 

PSG § 

Software Refinery jl(| 
Synthesizer Generator 


CWI and University of 
Amsterdam 

INRIA Sophia- Antipolis 
University of Paderborn 
University of L'Aquila 
Technical University of 
Darmstadt 

Reasoning Systems, Palo 
Alto 

Cornell University 



Table 3: Some language aspects. 



Syntax 
Prettyprinting 
Typechecking 
Interpretation 
Translation 
Debugging 



be described in terms of attributed syntax rules (an exten- 
sion of BNF) , conditional rewrite rules, inference rules, or 
transition rules. 

Some examples of DSL development using a language 
development system are given in Table |. The Box pret- 
typrinting meta-language is an example of a DSL devel- 
oped with a language development system (in this case the 
ASF+SDF Meta-Environment) for later use as one of the 
meta-languages of the system itself. Risla is a language for 
describing loans and mortgages offered by banks. Its con- 
structs are described in terms of their translation to Cobol. 
The LaCon system is not listed among the language devel- 
opment systems in Table ||, but will be discussed in the 
next section. 



3 Toward Language Design Assis- 
tants 

Actually, the language development systems listed in Ta- 
ble H incorporate little language design knowledge. Their 
main assets are the meta-languages they support, and in 
some cases a meta-environment to aid in constructing and 
debugging language descriptions. Even though tailored 
toward the language aspect they have to describe, these 
meta-languages are often hard to use. In many cases they 
were selected primarily for their favorable mathematical or 
logical properties rather than their user- friendliness. Also, 
these properties tend to become less important when lan- 
guage descriptions become large. 

To turn language development systems into true lan- 
guage design assistants (LDAs) 11 1, improvements can be 



sought in various directions: 

• Incorporation of language concepts and design rules. 

• Visual or semi-visual meta-languages. The latter are 
partly graphical and partly textual. 

• Description by example of some (necessarily limited) 
language aspects such as prettyprinting or syntax. 

Clearly, these improvements are not simple. LDAs will 
become far more complex than language development sys- 
tems, which are not particularly simple to begin with. For- 
tunately, LDAs may be approached step by step. Also, not 
all of the above improvements are completely new: 

• The LaCon system (Table ||) allows domain experts to 
compose elements and properties of a DSL by simple 
yes/no decisions. It automatically checks consistency 
of user decisions, computes their consequences, and 
provides design style advice. To generate an imple- 
mentation, it uses the Eli language development sys- 
tem (Table |2|) as back-end. 

• The Gem-Mex system (Table |^) already supports a 
semi- visual notation for the transition rules it uses to 
define the typechecking, interpretation and translation 
of language constructs. 

• Less concretely, no longer pursued but nevertheless in- 
teresting plans for the Language Development Labora- 
tory included a library of reusable language constructs, 
a knowledge base containing knowledge of languages 
and their compilers / interpreters , and a tool for lan- 
guage design Jn], [TsJ . 

In the remainder of this section we focus on the incorpo- 
ration of language design knowledge and on description by 
example of selected language aspects. 

3.1 Incorporation of Language Concepts 
and Design Rules 

Basically, the language designer using an LDA picks suit- 
able language building blocks from the language knowledge 
base (language library), customizes them, and composes 
them into larger and larger language fragments. It may 
be necessary to add entirely new building blocks and con- 
cepts in the process, especially domain-specific ones, since 
it cannot be expected that everything required is already 
present in customizable form. The LDA provides feedback 
during customization and composition. Finally, the fin- 
ished design is implemented by a language development 
system that serves as back-end to the LDA. 

More specifically, the main elements and notions that 
would play a key role in an LDA are: 

• Language concept Some examples are given in Ta- 
ble Q. They are rather diverse. For instance, some 
language concepts, such as "statement" , "expression" , 
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Table 4: Tool generation capabilities of representative language development systems. 



System 


Generated tools 


ASF+SDF Meta-Environment 


Scanner/parser (generalized LR), prettyprinter, syntax-directed 
editor, typechecker, interpreter, origin tracker, translator, renova- 
tion tools, . . . 


Centaur 


Scanner/parser (LALR), prettyprinter, syntax-directed editor, 
typechecker, interpreter, origin tracker, translator, . . . 


Eli 


Scanner/parser, typechecker, interpreter, translator, . . . 


Gem-Mex 


Scanner/parser, typechecker, interpreter, translator, debugger, . . . 


PSG 


Scanner/parser, syntax-directed editor, incremental typechecker 
(even for incomplete program fragments), interpreter 


Software Refinery 


Scanner/parser (LALR), prettyprinter, syntax-directed editor, 
object-oriented parse tree repository (including dataflow rela- 
tions), Y2K/Euro tools, program sheer, . . . 


Synthesizer Generator 


Scanner/parser (LALR), prettyprinter, syntax-directed editor, in- 
cremental typechecker, incremental translator, . . . 



Table 5: Examples of DSL development using a language development system. 



DSL 


Application domain 


System used 


Box H 


Prettyprinting 


ASF+SDF Meta-Environ- 






ment 


Cubix p| 


Virtual data warehousing 


Gem-Mex 


Risla § 


Financial products 


ASF+SDF Meta-Environ- 






ment 


(Various) 


Data model translation 


LaCon @ 



Table 6: Some language concepts. 



syntax 


sentence 


abstract syn- 
tax 


signature 


term 






program 


procedure 


function 


statement 


expression 


assignment 


goto 


conditional 


loop 


exception 
handler 


input 


output 


operand 


value 


type 


subtype 


inheritance 


strong typ- 
ing 


overloading 


polymorphism 


untyped 


record 


class 


subclass 


template 
class 


object 


module 


parameterized 
module 


import 


export 


actualization 


instantiation 


external 


library 




concurrency 


thread 


process 


exception 


overflow 






scope 


block 


static scope 


dynamic 
scope 


local variable 


global vari- 
able 


static vari- 
able 


hie 


parameter 


argument 










state 


heap 


call stack 


stack frame 


side-effect 


data flow 


control flow 


typechecking 


interpretation 


translation 


transforma- 
tion 


abstract in- 
terpretation 


data flow 
analysis 


control flow 
analysis 
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or "loop" , correspond more or less directly to language 
constructs. These are language building blocks (see be- 
low). Other ones, such as "scope" or "side-effect" have 
the character of an attribute to a language building 
block, while "state" or "call stack" refer to the dy- 
namic behavior of programs or to the language's im- 
plementation. Note that some of the concepts in the 
bottom row correspond to language aspects mentioned 
in the previous section. 

A more precise classification of language concepts from 
the perspective of their use in an LDA is currently 
being attempted. Some concepts may turn out to be 
more useful than others. 

• Language building block Language concept corre- 
sponding more or less directly to a language construct, 
such as "statement", "expression", or "loop". It may 
have many attributes, which themselves correspond 
to language concepts of a different kind, such as (ab- 
stract) syntax, typechecking, interpretation, data and 
control flow, side-effects, exceptions, among others. 

• Relation Relation between language concepts. May 
itself be a (higher-order) language concept. Attributes 
like "scope" and "side-effect" are unary relations. "Im- 
plementation" would be an example of a binary one. 

• Language knowledge base Knowledge base of lan- 
guage concepts and their relations. It may include the- 
ories of the concepts it contains. These may range 
from rudimentary to elaborate. 

• Customization The process of adapting language 
building blocks during the language design process by 
means of instantiation, transformation, or generation. 

• Composition (Partly) customized language building 
blocks can be composed into larger language frag- 
ments. In this way the DSL is constructed step by 
step. 

• Constraint checking is triggered by customization 
and composition to check, at least to some extent, the 
validity of the resulting language building block. The 
constraint checker is a parameter of the LDA design. 

• Design language Meta-language allowing the user 
to formulate language design questions and interro- 
gate/browse the language knowledge base. 

• Language knowledge representation language 

Visual or semi-visual meta-language to express lan- 
guage concepts and their relations. It has well-defined 
sublanguages to express language building blocks and 
their attributes. These sublanguages are compiled to 
the corresponding meta-language(s) of the language 
development system that is used as a back-end. 

• Language development system Back-end of the 
LDA and parameter of the LDA design. The LDA 



facilitates the language description process and then 
uses a language development system to generate the 
tooling from the finished description. As noted, LaCon 
uses Eli as back-end. In our case, the ASF+SDF Meta- 
Environment (Table |J) will be the back-end. Many of 
the other systems mentioned in the previous section 
would be suitable as well. 

3.2 Description by Example of Selected 
Language Aspects 

For some language aspects description by example (DBE) 
may become a user-friendly alternative to (or addition 
to) exhaustive description in terms of a specialized meta- 
language. We should not be too ambitious. DBE of the 
interpretation or translation of a language is not realistic, 
but prettyprinting or syntax may be sufficiently limited for 
DBE to become useful. 

In fact, prettyprinting is a special case of text format- 
ting. DBE of the latter is supported to some extent by 
Microsoft Word, which is capable of defining "styles" from 
user supplied examples. Another system supporting for- 
matting by example is Tourmaline p7[ . It contains rules 
that try to determine the role of different parts of a header 
in a text document, such as section number, title, author, 
and affiliation, as well as the formatting associated with 
each part. The results are displayed in a dialogue box for 
the user to inspect and correct. 

DBE of syntax for the purpose of language develop- 
ment was already suggested in [[|, but to the best of our 
knowledge not put into practice. Since then, considerable 
progress has been made both in the theory of syntax infer- 
ence from example sentences as well as in the computing 
power that can be brought to bear on the inference process. 

The user-friendliness of DBE is due to the fact that 
examples of intended behavior do not require a special- 
ized meta-language, or only a small part of it. For in- 
stance, prettyprinting by example would try to infer gen- 
eral prettyprinting rules from a test suite of prettyprintcd 
constructs or programs in the language under develop- 
ment. The inferred rules might be expressed in a language 
like Box (Table ||), the prettyprinting meta-language of 
the ASF+SDF Meta-Environment, but the test suite itself 
would not require this language. 

The inference mechanism used in the implementation of 
DBE may range from very limited and predictable gener- 
alization, hardly deserving to be called inference, to full- 
fledged inductive inference. The latter has a probabilistic 
character in the sense that the inferred generalization is 
in some sense the simplest one that is correct on the ex- 
amples given to it, but it need not be the one intended 
by the user. To find out, the user may have to inspect 
the generated rules, which is rather unattractive. And if 
the generalization turns out to be incorrect, it may again 
be hard to find out which examples have to be added or 
changed. 

These are largely unsolved problems, partly offsetting 
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the advantages of the stronger forms of DBE suffering from 
them, so simple, predictable DBE is to be preferred. 
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